Text Typology and Statistics: Explorations in Italian Press Subgenres
Abstract
According to Biber's definition, text types are groupings of texts that are similar in linguistic form, whereas genre categories are assigned on the basis of use. Texts from a single genre may therefore be classified into different text types. In the present research, an experiment will be carried out to single out different text typologies among Italian press subgenres on the basis of morphosyntactic features alone. The approach will be corpus-based and the aim purely exploratory. A virtually random sample will be extracted from two tagged Italian corpora and divided into 22 files, one per subgenre, and 41 morphosyntactic features will be counted per article within each subgenre. The raw frequencies will be normalised. The dataset built from the normalised frequencies (quantitative variables) will be submitted to descriptive statistics and factor analysis. Descriptive statistics give information about the distribution of the variables; they are a preliminary step in any exploration and give a better understanding of the data. Factor analysis studies the correlations among a large number of interrelated quantitative variables by grouping them into a small number of factors, which help to reveal the structure of the correlations or the underlying construct. The aim of this research is to investigate to what extent statistical techniques can help to classify texts in a stable and reliable way.
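The pipeline outlined in the abstract (normalise raw counts, describe each variable, then look for factors in the correlations) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the three feature names, the toy counts, and the choice of a rate per 1,000 tokens are not taken from the study. The last step is also a deliberate simplification: it extracts only the first principal factor of the correlation matrix by power iteration, without rotation, and is not the factor analysis procedure the study itself would use.

```python
import math
import statistics

def normalise(raw_counts, n_tokens, per=1000):
    """Turn raw feature counts for one article into a rate per `per` tokens.
    The base of 1,000 tokens is an assumption, not stated in the abstract."""
    return {feat: count * per / n_tokens for feat, count in raw_counts.items()}

def describe(values):
    """Basic descriptive statistics for one normalised variable."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_factor_loadings(corr, iterations=200):
    """Loadings of the first principal factor of a correlation matrix.
    Power iteration finds the dominant eigenpair; each loading is
    sqrt(eigenvalue) * eigenvector component. Toy method: it assumes the
    first variable is correlated with the dominant factor."""
    n = len(corr)
    vec = [1.0] + [0.0] * (n - 1)
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(p * p for p in product))
        vec = [p / eigenvalue for p in product]
    return [math.sqrt(eigenvalue) * v for v in vec]

# Hypothetical articles: (token count, raw counts of three made-up features).
features = ["nouns", "adjectives", "first_person_pronouns"]
articles = [
    (500, {"nouns": 150, "adjectives": 50, "first_person_pronouns": 5}),
    (800, {"nouns": 200, "adjectives": 64, "first_person_pronouns": 16}),
    (400, {"nouns": 80, "adjectives": 20, "first_person_pronouns": 20}),
    (1000, {"nouns": 220, "adjectives": 60, "first_person_pronouns": 40}),
]

rates = [normalise(counts, n_tokens) for n_tokens, counts in articles]
columns = [[row[feat] for row in rates] for feat in features]

summary = {feat: describe(col) for feat, col in zip(features, columns)}
corr = [[pearson(a, b) for b in columns] for a in columns]
loadings = first_factor_loadings(corr)

for feat, loading in zip(features, loadings):
    print(f"{feat}: mean rate {summary[feat]['mean']:.1f}, loading {loading:+.2f}")
```

In this toy data the two nominal features load on the factor with one sign and the pronoun feature with the opposite sign, which is the kind of co-occurrence pattern a factor is meant to summarise. A real analysis with 41 features would extract several factors and normally apply a rotation, for which a statistics package is the better tool.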
Related resources
Italian Political Communication and Gender Bias: Press Representations of Men/Women Presidents of the Houses of Parliament (1979, 1994, and 2013)
The study considers mass media communication as intertwined with social norms, as assumed by the perspective of social representations. It explores the Italian press communication by focusing on three pairs of men and women politicians with different political orientations and all serving as presidents of the Houses of Parliament in three legislatures. The article concentrates on five newspaper...
Design and Annotation of the First Italian Corpus for Text Simplification
In this paper, we present the design and construction of the first Italian corpus for automatic and semi-automatic text simplification. In line with current approaches, we propose a new annotation scheme specifically conceived to identify the typology of changes an original sentence undergoes when it is manually simplified. Such a scheme has been applied to two aligned Italian corpora, containing o...
Pacific Association for Computational Linguistics toward Pragmatic Use of Evaluation Resources
Natural language processing requires more and more voluminous textual corpora, and lots of evaluations are led in this field. So, from the viewpoint of improving heterogeneous corpora (unrestricted texts) morpho-syntactic tagging, an adaptive approach to the type of text using resources provided by an evaluation will be presented. To justify this approach, basic statistics are applied on French M...
Knowledge Intensive Word Alignment with KNOWA
In this paper we present KNOWA, an English/Italian word aligner, developed at ITC-irst, which relies mostly on information contained in bilingual dictionaries. The performances of KNOWA are compared with those of GIZA++, a state of the art statistics-based alignment algorithm. The two algorithms are evaluated on the EuroCor and MultiSemCor tasks, that is on two English/Italian publicly availabl...
Whose job instability affects the likelihood of becoming a parent
We examine the likelihood of becoming a parent in Italy taking into account the employment (in)stability of both partners in a couple. We use data from four waves of the Italian section of the EU-SILC (Statistics on Income and Living Conditions), 2004-2007, accounting for its longitudinal nature. Overall, our results suggest that Italian couples are neither fully traditional nor entirely modern: ...
Publication date: 2003